Pitch Prediction Filters in Speech Coding
Authors: R. P. Ramachandran and P. Kabal
Abstract
Prediction error filters which combine short-time prediction (formant prediction) with long-time prediction (pitch prediction) in a cascade connection are examined. A number of different solution methods (autocorrelation, covariance, Burg) and implementations (transversal and lattice) are considered. It is found that the F-P cascade (formant filter before the pitch filter) outperforms the P-F cascade for both transversal- and lattice-structured predictors. The performances of the transversal and lattice forms are similar. The solution method that yields a transversal structure requires a stability test and, if necessary, a subsequent stabilization. The lattice form allows for a solution method which ensures a stable synthesis filter. Simplified solution methods are shown to be applicable for the pitch filter (multitap case) in an F-P cascade. Furthermore, new methods to estimate the appropriate pitch lag for a pitch filter are proposed for both transversal and lattice structures. These methods perform essentially as well as an exhaustive search in an F-P cascade. Finally, the two cascade forms are implemented as part of an APC coder to evaluate their relative subjective performance.
I. INTRODUCTION
In this paper, speech coder configurations which use two nonrecursive prediction error filters to process the incoming speech signal are examined. Conventionally, the prediction is carried out as a cascade of two separate filtering operations. The first filter, referred to here as the formant filter, removes near-sample redundancies. The second, termed the pitch filter, acts on distant-sample waveform similarities. The resulting residual signal is quantized and coded for transmission. In an adaptive predictive coder (APC), these predictors are placed in a feedback loop around the quantizer. An additional quantization noise shaping filter can be employed to reduce the perceived distortion in the decoded speech [1], [2].
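The F-P cascade described above can be illustrated with a minimal Python sketch. It is not the procedure of this paper: it assumes an autocorrelation-method formant predictor solved by the Levinson-Durbin recursion, followed by a one-tap pitch predictor whose lag is found by exhaustive search over in-frame samples only. All function names, the synthetic test frame, and the lag range are hypothetical choices made for illustration.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (samples outside the frame taken as zero)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] is predicted by sum_k a[k] x[n-k]."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
    return a[1:]

def formant_residual(x, a):
    """Formant prediction error d[n] = x[n] - sum_k a[k] x[n-k] (zero history before the frame)."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def pitch_one_tap(d, min_lag, max_lag):
    """Exhaustive lag search for a one-tap pitch predictor (assumed form: beta * d[n - lag])."""
    best_lag, best_beta, best_gain = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(d[n] * d[n - lag] for n in range(lag, len(d)))
        energy = sum(d[n - lag] ** 2 for n in range(lag, len(d)))
        if energy <= 0.0:
            continue
        gain = cross * cross / energy   # reduction in squared error for this lag
        if gain > best_gain:
            best_lag, best_beta, best_gain = lag, cross / energy, gain
    return best_lag, best_beta

def pitch_residual(d, lag, beta):
    """Pitch prediction error e[n] = d[n] - beta * d[n - lag]; first `lag` samples pass through."""
    return [d[n] - beta * d[n - lag] if n >= lag else d[n] for n in range(len(d))]

def synthetic_frame(length=240, period=40):
    """Hypothetical test signal: a pulse train (period 40) driving a two-pole resonator."""
    x = []
    for n in range(length):
        pulse = 1.0 if n % period == 0 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(pulse + 1.3 * x1 - 0.8 * x2)
    return x

def energy(s):
    return sum(v * v for v in s)

x = synthetic_frame()
a = levinson_durbin(autocorrelation(x, 2), 2)   # formant filter first (F-P ordering)
d = formant_residual(x, a)
lag, beta = pitch_one_tap(d, 20, 100)           # pitch filter acts on the formant residual
e = pitch_residual(d, lag, beta)
print("lag:", lag, "energies:", energy(x), energy(d), energy(e))
```

The sketch makes the ordering explicit: the pitch predictor is fitted to the formant residual rather than to the speech itself. A P-F cascade would swap the two stages.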
An alternative description of an APC coder uses an open-loop predictor configuration and a noise feedback filter [3]. A block diagram of such a configuration is shown in Fig. 1. This type of open-loop arrangement is also used in code-excited linear prediction (CELP) [4]. In CELP, the coding is accomplished by selecting a waveform from a given repertoire of waveforms. The selection process uses an analysis-by-synthesis strategy. Conceptually, each candidate waveform is passed through the synthesis filters to find the one which produces the best quality speech. Noise shaping is accomplished by including a frequency weighting in the error criterion which is used to choose the best waveform.
Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.
Manuscript received June 10, 1987; revised August 30, 1988. This work was supported by the Natural Sciences and Engineering Research Council of Canada. R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7. P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7 and INRS-Telecommunications, Université du Québec, Verdun, P.Q., Canada H3E 1H6. IEEE Log Number 8826113.
In both APC and CELP, the residual signal or the selected codeword (after scaling by the gain factor) is passed through a pitch synthesis filter and a formant synthesis filter to reproduce the decoded speech. The filtering in the synthesis phase can be viewed in the frequency domain as first inserting the fine pitch structure and then shaping the spectral envelope (formant structure). The analysis to determine the predictor coefficients is carried out frame by frame. The filter coefficients are then coded for transmission. The quantization of these coefficients is outside the scope of the present study.
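The synthesis order described above (pitch synthesis first, then formant synthesis) can be checked with a short Python sketch. It assumes a one-tap pitch predictor and uses arbitrary, hypothetical coefficient values (a = [1.3, -0.8], beta = 0.5, lag = 40) that are not taken from this paper. With an unquantized residual, the two recursive synthesis filters should undo the two nonrecursive analysis filters up to rounding error.

```python
import math

def formant_analysis(x, a):
    """Nonrecursive formant filter: d[n] = x[n] - sum_k a[k] x[n-k]."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def pitch_analysis(d, beta, lag):
    """Nonrecursive one-tap pitch filter (assumed form): e[n] = d[n] - beta * d[n - lag]."""
    return [d[n] - (beta * d[n - lag] if n >= lag else 0.0) for n in range(len(d))]

def pitch_synthesis(e, beta, lag):
    """Recursive inverse of pitch_analysis: d[n] = e[n] + beta * d[n - lag]."""
    d = []
    for n, v in enumerate(e):
        d.append(v + (beta * d[n - lag] if n >= lag else 0.0))
    return d

def formant_synthesis(d, a):
    """Recursive inverse of formant_analysis: x[n] = d[n] + sum_k a[k] x[n-k]."""
    x = []
    for n, v in enumerate(d):
        x.append(v + sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return x

# Hypothetical coefficients and test signal, chosen only for illustration.
a, beta, lag = [1.3, -0.8], 0.5, 40
x = [math.sin(0.3 * n) + 0.5 * math.sin(0.05 * n) for n in range(200)]

# Analysis (F-P ordering): formant filter, then pitch filter.
e = pitch_analysis(formant_analysis(x, a), beta, lag)

# Synthesis: pitch structure is inserted first, then the spectral envelope is shaped.
y = formant_synthesis(pitch_synthesis(e, beta, lag), a)

max_err = max(abs(u - v) for u, v in zip(x, y))
print("maximum reconstruction error:", max_err)
```

In a real coder the residual is quantized before synthesis, so the reconstruction is no longer exact, and the stability of the recursive synthesis filters becomes a practical concern.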
The coded filter coefficients, along with the quantized excitation information, are used by the decoder to reconstruct the speech. The frame update rate is chosen to be slow enough to keep the required transmission rate small, yet fast enough to allow the speech segment under analysis to be adequately described by a set of constant parameters. Depending on the application, the effective frame size usually corresponds to time intervals between 5 and 20 ms.
The aim of this paper is to study predictors which incorporate both short-time and long-time prediction. The effect of the ordering of the prediction filters in the cascade connection is considered. The filters will be implemented in both lattice and transversal forms. In addition, methods to determine the lag used for the pitch filter will be derived. The two predictor configurations incorporating the transversal and lattice solutions are tested as part of an APC coder that is equivalent to the one shown in Fig. 1. This allows us to assess the relative perceptual quality of the decoded speech that results from the use of different configurations and solutions.
The next section will introduce the different configurations for formant and pitch filters. This is followed by an analysis of a prediction error filter which uses general delays. This general structure subsumes both formant and pitch filters and allows for both autocorrelation and covariance analyses. The following section makes the analysis specific to pitch filters. A comparison of the techniques is given in Section V. Then, the stability properties of the synthesis filters are examined for different configurations. Section VII examines means to determine an appropriate lag for the pitch filter. Finally, Section VIII discusses the relative performance of the different options when implemented as part of a speech coder.
0096-3518/89/0400-0467$01.00 © 1989 IEEE. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 4, April 1989.
II. FORMANT AND PITCH PREDICTORS
The conventional formant predictor has a transfer function
Similar resources
Stability and performance analysis of pitch filters in speech coders
This paper analyzes the stability and performance of pitch filters in speech coding when pitch prediction is combined with formant prediction. A computationally simple stability test based on a sufficient condition is formulated for pitch synthesis filters. For typical orders of pitch filters, this sufficient test is very tight. Based on the test, a simple stabilization technique that minimizes...
Speech Coding at 4.8 kb/s with an Improved Pitch Filter
The reconstructed speech quality in a low bit-rate CELP coder is very dependent on the performance of the pitch filter. In this paper, we present an improved pitch filter, a fractional pseudo-three-tap pitch synthesis filter, which performs better than a conventional one-tap pitch filter. We discuss the frequency response of the improved pitch filter. We explore stability issues for three-tap pi...
Adaptive Linear Prediction in Speech Coding
Adaptive linear prediction is commonly used as a key step in digital coding of speech. This paper discusses some of the techniques that have been developed for adapting and coding the predictor coefficients in speech coders. The linear predictors in high quality speech coding often consist of two stages, a short-time span (formant) filter and a long-time span (pitch) filter. The use of such fil...
Joint Optimization of Linear Predictors in Speech Coders
Low bit rate speech coders often employ both formant and pitch predictors to remove near-sample and distant-sample redundancies in the speech signal. The coefficients of these predictors are usually determined for one prediction filter and then for the other (a sequential solution). This paper deals with formant and pitch predictors which are jointly optimized. The first configuration considere...
Analysis by synthesis speech coding with generalized pitch prediction
A new analysis-by-synthesis speech coding structure is presented for high-quality speech coding in the 4 to 8 kb/s range. CELP with generalized pitch prediction (GPP-CELP) differs from classical code-excited linear prediction (CELP) in that for voiced segments it is the speech signal that is decomposed into a component predictable with the aid of the adaptive codebook (ACB) and a nonpredictable ...